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Abstract: In a health management system, prognostics, which is an engineering discipline 
that predicts a system's future health, is an important aspect yet there is currently limited 
research in this field. In this paper, a hybrid approach for prognostics is proposed. The 
approach combines the least squares support vector regression (LSSVR) with the hidden 
Markov model (HMM). Features extracted from sensor signals are used to train HMMs, 
which represent different health levels. A LSSVR algorithm is used to predict the feature 
trends. The LSSVR training and prediction algorithms are modified by adding new data 
and deleting old data and the probabilities of the predicted features for each HMM are 
calculated based on forward or backward algorithms. Based on these probabilities, one can 
determine a system's future health state and estimate the remaining useful life (RUL). To 
evaluate the proposed approach, a test was carried out using bearing vibration signals. 
Simulation results show that the LSSVR/HMM approach can forecast faults long before 
they occur and can predict the RUL. Therefore, the LSSVR/HMM approach is very 
promising in the field of prognostics. 

Keywords: prognostics; least squares support vector regression; hidden Markov model; 
remaining useful life 
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1. Introduction 

Condition-based maintenance (CBM), which can increase system maintenance efficiency and 
reduce life cycle cost, is gaining popularity. Developing the capability of prognostics and remaining 
useful life (RUL) prediction can save great costs and improve the logistics support. Therefore, 
prognostics and RUL prediction can improve the functions of CBM [1,2]. 

At present, various health management systems are being gradually proposed and applied, such as 
health and usage monitoring system (HUMS) [3], integrated vehicle health management (IVHM) [4], 
and prognostics and health management (PHM) [5,6]. PHM, proposed in Joint Strike Fighter (JSF), 
represents the highest level of CBM. In these health management systems, prognostics is an important 
but difficult task. Currently, there is limited research in this field due to the fact that small fault 
symptoms and disturbances cause difficulties in carrying out studies. Therefore, this paper aims to 
contribute to the growing body of research in the area of prognostics technology. 

Prognostic approaches can be placed into three categories: physical model-based prognostics, 
evolutionary or trending model-based prognostics, and experience-based prognostics [7]. The physical 
model-based method has the highest accuracy, but it is often complicated to model precisely and the 
cost is high. Evolutionary or trending-based prognostics are data-driven and feature-based methods. 
They are considered effective and available prognostics approaches. The third category, 
experience-based prognostics, relies on statistical reliability. Some popular prognostic methods of the 
second category are the auto-regressive moving average (ARMA) model [8], the artificial neural 
network (ANN) [9], the Bayesian network [10], and the support vector regression (SVR) [11,12]. 
Among these methods, ANN is a widely-used method. The disadvantages of ANN are that the number 
of hidden layers is difficult to choose and that the calculation may fall into a local minimum. SVR is a 
machine-learning algorithm based on Vapnik-Chervonenkis (VC) theory and the structural risk 
minimization principle; therefore it avoids the disadvantages of ANN [11] and is a potential method in 
prognostics. There are also other extended SVR methods, such as least squares support vector regression 
(LSSVR) [13,14]. LSSVR proposed by Suykens [15,16] is often applied to predict non-linear 
time-series signals. In health management, LSSVR can be used to estimate the future health 
deterioration and can be extended for remaining useful life (RUL) prediction. There are a series of 
articles about the use of LSSVR in prognosis and monitoring, such as [17,18]. Although the LSSVR 
algorithm can predict the signal trends, it is difficult to assess the health status of a system for each 
stage. To solve this problem and make health assessments more clear, in this paper a hidden Markov 
model (HMM) method is introduced to describe health states during component degradation. The 
HMM technique, initially introduced by Baum [19], is a stochastic approach based on Markov chains. 
HMM has strong fault classification and status classification ability, therefore, the HMM method can 
be used in fault detection and diagnosis [20-22], but it cannot predict the future health status. 

In this paper, a hybrid fault prognosis method based on LSSVR and HMM is proposed. In this 
method, LSSVR is used to predict the fault features, while HMMs are used to describe health states. 
Fault processes can be divided into several stages and HMMs are built and trained for each stage. 
Suppose given an observation sequence, the probabilities of the observation for HMMs are used to 
determine which health state of the given observations the sequence belongs to. Combining these 
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HMMs with the future features predicted by LSSVR can provide an estimate of the remaining useful 
life (RUL) of a system. 

The remainder of the paper is organized as follows: Section 2 outlines the framework of the 
LSSVR/HMM prognostics approach. Section 3 presents the training and estimation algorithms of 
HMM. Section 4 introduces the prediction algorithm based on LSSVR. Section 5 provides an 
application and simulation results. Section 6 presents our conclusions. 

2. LSSVR/HMM Based Prognostics 

2.1. The Hidden Markov Model 

This aim of this section is to introduce how to use the HMM-based method to describe a fault 
process. The fault process of a machine has a certain time-span. To analyze the fault process, it can be 
divided into several health states. However, fault states are unobservable but hidden in some 
observable signals, such as vibration signals. The HMM-based method provides a way to detect 
the unobservable fault states, as HMM is a stochastic technique for classifying signals and 
modeling [23,24]. An HMM describes a double stochastic process. One is the Markov chain, which 
describes state transitions. Another is the stochastic process, which describes the relations between 
states and observations. Suppose the number of health states is N 9 which are hi, Ii2,. . ., h^. The last state 
hjv represents that the system fails. This state sequence is a Markov chain. The structure of an TV-state 
left-to-right HMM is commonly used in fault analysis as shown in Figure 1 . 

Figure 1. N-state left-to-right hidden Markov model. 




As shown in Figure 1, an HMM consists of a finite number of states where a# (ij = 1,2,. . .,N) is the 
transition probability from state h z to state h 7 . Observation sequences from different conditions can be 
used to train different HMMs. For instance, observations from normal condition can train an HMM for 
normal state. An HMM can be described as: 

A = (N,M,x,A,B) (1) 

where TV is the number of states, M is the number of observations for each state, n is the initial 
probability vector, A = (%)#x# is the probability transition matrix which describes the transition 
relations between the states, B = (bjk)N*M describes the observation probability distribution and bjk is 
the probability of the kth observation in state j. N and M are known based on history experiences or 
prior knowledge. In other words, they are assumptions before HMM training. 7zr, A, B are learned from 
the data. As N and M are also included in A and B, the model can be abbreviated as [23]: 
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X = {k,A,B) 



(2) 



As the observation sequence from a health state can train an HMM, N HMMs can be trained, which 
are written as fa, fa,..., In- h represents the HMM trained based on the given observation sequence 
from the ith health state. Given an observation sequence 0={o\, 02,..., ot) where T is the length of the 
sequence, P(0\Ai) is the likelihood probability of observation sequence O occurring at the condition of 
model fa, which can be used to determine whether O belongs to the health state h z . 

As mentioned above, using HMMs can describe machine fault processes. According to the 
likelihood probabilities of present observations, HMMs can detect and diagnose faults. However, the 
HMM method cannot predict future health states and the RUL, because the future observation 
sequence cannot be obtained. To solve this problem, a hybrid prognostics method based on LSSVR 
and HMM is proposed in this paper, where LSSVR provides a way to predict the future fault features. 

2.2. Framework of LSSVR/HMM Prognostics 

The framework of the hybrid fault prognosis scheme is described in Figure 2. In this scheme, the 
LSSVR algorithm is used to predict the future fault features and the HMMs are used to describe 
different health states. 

Figure 2. Framework of the LSSVR/HMM fault prognostics scheme which contains two 
processes. One is HMMs training (offline) and another is fault feature and RUL prediction 
process (on line). 
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As shown in Figure 2, the process of the hybrid LSSVR/HMM prognostic approach can be 
summarized in the following steps: 

(1) Feature extraction 

As most observation signals have much random noise and uncertain interferences, features should 
be extracted from the signals before fault diagnosis and prognosis. In this paper, the linear predictive 
(LP) method, as mentioned in article [25], is used to extract the features from vibration signals. In the 
LP method, the signal S n can be predicted as a linear combination of the P previous signal, which can 
be written as: 

P 

K ~ h S n-\ + h S n-2 ^ h lp S n-p ~ 2j h S n-i (3) 

i=\ 

where /i, h, ...,l p are linear prediction coefficients which can be used as feature vector of the signals. 
The residual errors are written as: 

P 

e n = S n~h = S n~Yj l i S n-i (4) 

The value of the linear prediction coefficients /i, h, l p can be calculated by minimizing the 
mean-square value of the residual errors over a signal window. The mean square value is written as: 

V m V m i=l 

where e n (m) = e (n + m), S n (m) = S(n + m), n is the starting point of the signal window, n + m is the 
terminal point of the signal window. 

Vibration signals are always non-stationary. If signals are segmented into several short-time signals, 
they can be considered stationary. Therefore, we can obtain short-time signals by segmenting vibration 
signals into small windows; and each window as a frame. To segment the vibration signals into small 
windows, the length of a small window should be determined. Each window has equal length. Based 
on the given length, the vibration signals can be divided into several small windows. And then the 
linear prediction coefficients can be extracted from these windows. The feature extraction is described 
in Figure 3. 

Each window is coded into a feature vector which is written as: 

o t =[I n ,l t2 ,.--l tp ] T , l<t<T (6) 

where o t is the feature vector of the /th signal window, l t \, la, l tp are the linear prediction 
coefficients of the tth signal window, T is the number of the windows. Then the observation sequence 
0={o\ 9 6>2, ot} is composed of these feature vectors, which will be used for HMM training or 
LSSVR prediction. 



Sensors 2013, 13 

Figure 3. Feature extraction by segmenting the vibration signals into small windows. 



5547 




o, o 2 o 3 ° T1 °t 

(2) HMMs training 

HMMs training can now be carried out based on the fault features history. In this paper, the historic 
fault features are experimental data. The training data used for a specific state come from the 
corresponding experimental conditions. The essence of this step is to estimate the HMM parameters 
for each state. Based on observation sequences from N different conditions, optimal HMMs k\ 9 fa . . fa 
are trained for corresponding health state. This step is an off-line process. 

(3) Prediction of features based on LSSVR 

Suppose the present time is t, and the feature vector after k time units is denoted o t+k which can be 
predicted based on the data before time t using the LSSVR algorithm. Based on the feature vector 
o t+k •> one can predict the system health state at the future time t + k. 

(4) Log-likelihood calculation 

Calculating likelihood probabilities P( o t+k \fa) according to forward or backward algorithms [26,27] 
and comparing all of these probabilities, one can find the maximum probability: 

i = index{rmx{P x , P 2 , . . . , P N )) (7) 

where P\,Pi, . . Pn are the likelihood probabilities for the corresponding HMMs. 

The corresponding state h z is considered the health state of the future time t + k. Therefore, this 
method can be used to predict future health states. If k = 0, the health state is the present state. 

(5) RUL prediction. 

As shown in Figure 2, the fault prognosis should run until the probability for the last state model is 
higher than for all other state models. Then the probability P( o t+k \ fa) for the last state indicates that 
the system will fail at the future time t + k. Therefore, the RUL is k time units. 

3. HMM Training 

HMM training is to obtain the optimal model and the training process needs to estimate the model 
parameters. An HMM is described in Equation (2). When the observations are discrete, they can be 
represented by distributed probabilities. However, the observations are always continuous in an actual 
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system. In this situation, a mixture of several Gaussian distributions is used to describe observation 
probabilities which are written as [26]: 

Q 

b j( 0 t) = T, c ji N (°t\Mji^ji) (8) 



/=i 



N(o t | //. /9 S . z ) = (9) 
v z;r J I ^ji I 

where 6/(0*) is the probability of observation o t at the condition of state h 7 , o t is the feature vector of the 
rth window, Q is the number of the mixture Gaussian components, Cji is the weight of the /th mixture 
component in state h 7 , and //// and 2)-/ are the mean and covariance of the Gaussian density. 
Then an HMM is re- written as: 

Z = (x,A, f i JI ,I. J1 ,c Jl ) (10) 

3.1. Forward-Backward Algorithm 

Given an observation sequence 0={o\, 02,... , oj } and a model \ how to calculate the probability 
P(0\X) is a basic and important problem. Both forward and backward algorithms can solve this problem. 

In a forward algorithm, define the forward variables as rji(t) = P(o\, 02, o h q t = h, < t < T, 
which is the probability of generating o\, 02, ...,o t and ending in state h/. q t represents the state attime t. 

The recursive estimation of the forward is as follows [26,27]: 

(a) Initialization: 

r Ji Q) = 7T i b i (Oi) (11) 



(b) Forward recursion (For /=1,2,. . .,T-1, 1 <j <N)\ 



N 



i=\ 



bj(o t+l ) (12) 



Vj {t + \) = 
(c) End: 

P{0\X) = f j i li (T) (13) 

In a backward algorithm, we define the backward variables as fii(t) = P(o t +u o t +i 9 ... 9 ot\ qt = h;, X), 
1 < £ < T, which is the probability of generating o t +\, o t+ 2,.. . , o T and ending in state h/. 
The recursive estimation of the backward is as follows [26,27]: 

(a) Initialization (1 < / < TV): 

$r(0 = l (14) 

(b) Backward recursion (For ^ = 7-1,7-2, 1, 1 <i<N): 

N 

A(0 = Z a A(^+i)A + iO") (15) 

7=1 

(c) End: 

TV 



P(O|A) = 2A(0 (16) 



1=1 
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3.2. Re-Estimation of the HMM Parameters 

The goal of HMM training is to find the model X* = argmax P(<9|A). That means the parameters of 
each HMM should be re-estimated until the convergence error achieves requirements. The Baum- Welch 
algorithm [26] provides a solution to this problem. 

Given an observation sequence 0, define y t (i) = P(q t = hi \0, X), which is the probability of that the 
state at time t is h,-. y t {i) can be calculated as: 

7,(0A(0 



^ajmu) ( 17 ) 



Define y t (i, T) as the probability of the 1th mixture Gaussian component when the state at time t is h ; . 
y t {i, I) can be calculated as: 



y t (V)=y t (i) 



lS m ^ im N{o t \n im ,i: im ) (is) 



it(U j) = P{q t = h i} q t+1 = hj \0 9 X) is defined as the probability of that the state at time t is h t and the 
state at time t + 1 is h 7 . j) can be calculated as: 

4(^7>^(0^/^ + i)A + iO')/^l X) (19) 

If there are K observation sequences and the length of each sequence is T, the parameters of a 
HMM can be estimated as follows [26,27]: 



*-,.-^L 



(20) 



K 



^^.rfffl <21) 



& = 



(22) 



ft= <23) 



^ = — — (24) 



According to Equations (20)-(24), the parameters of the HMMs are re-estimated. Finally, optimal 
models of the health states can be obtained as shown in Figure 2. Given an observation, the likelihood 
probabilities for each state can be calculated according to Equations (13) or (16). In the forward-backward 
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algorithm, tji(t) and fi t (i) will tend to be zero as time grows. To avoid this problem, logarithmic values 
of probabilities are usually used in the actual calculations. 

4. Feature Trend Prediction Based on LSSVR 



4.1. LSSVR Algorithm 

The idea of SVR is to map non-linear problems to linear problems in a high dimension space. 
LSSVR is an improved algorithm based on SVR, and it uses a least square loss function instead of the 
^-insensitive loss function [16,28]. Compared with SVR, the calculation procedure of LSSVR has 
higher efficiency. 

Considering the sample data as {(jcij^i), (ay/), (xdJd)}, where Xi is the input vector, and 
yi is the objective value. The optimization equation of the LSSVR is described as [13,14]: 



1 2 1 D 

min J(a),e) = — \\co\\ + — CVe? 



1=1 



s.t. y i =co T ^(x i ) + fc + e i \<i<D 



(25) 



where co is weight vector, e t is error variable, C is a regularization parameter used to control the 
punishment degree of samples beyond the error, the function of &(*) is to allow LSSVR to work in a 
linear space, and k is the bias term which is an unknown and needs to be estimated. 
The Lagrangian is written as: 



D 



L(a>, e, k, a) = J {to, e) - ^ a t (ft> T ^(x- ) + K+e i -y i ) 



(26) 



i=\ 



where a t is Lagrange multipliers. 

According to Karush-Kuhn- Tucker (KKT) optimization conditions, the optimality conditions are 
given as [13,14]: 



dco 



D 



— = - > a- = 0 
3* tt ' 



dL_ 

dL 



da, 



= Ce, - a, = 0 



= -(a) T ^(x i ) + K + e i -y i ) = 0 



(27) 



By solving Equation (27), the parameters a t and k can be obtained. Then the LSSVR model for 
estimation is written as: 

D 



i=\ 



(28) 



where K(x t x) is the kernel function which satisfies the Mercer condition: 
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K(x r x k ) = 0(x i ) T 0(x k ) 



(29) 



Frequently used kernel functions are the polynomial, sigmoid and radial basis kernel (RBF) 
functions. This paper applies RBF because it can classify multi-dimensional data. The RBF kernel 
function is given as: 



K(x-x k ) = exp(- 1 - 



-) 



(30) 

In fault prognosis, time series are the extracted feature data. Suppose the front r data have been 
obtained, then the object is to predict the future data using the front r data. The input vector and the 
output value can be constructed as: 







X 2 


- * P ~ 




X p+l 




x 2 


x 3 • 


X p+l 


, Y= 


X p+2 




_ X r-p 


X r-p+l 


X r-\ 




_ X r _ 



where p is the number of embedded dimensions. According to above LSSVR method, training this 
sample data can estimate the parameters a t and k. Then the prediction function can be written as: 

D 



X r + k=Ts a i K ( X i'X"> + K 



i=l 



(31) 



where k is the prediction steps, x t is the z'th support vector, and x is x = {x r +^ . . . ., x r +i , . . . , x r +k_i } . 

As mentioned above, LSSVR can predict future fault features. However, it is difficult to describe 
and assess health states. Combining LSSVR with HMM, it can realize health state prediction and RUL 
prediction. The feature vectors predicted based on the LSSVR are used as observations in each HMM 
to calculate the log-likelihood probabilities as shown in Equation (13) or Equation (16). The model 
with the highest likelihood probability represents the system health state at the corresponding future time. 

4.2. Algorithm Improvement 

As analyzed above, two parameters C and ain the LSSVR algorithm need to be optimized. 
Selecting superior parameters can improve the training speed and the performance of LSSVR. Genetic 
algorithms (GA) [13] and particle swarm algorithms [14] are two optimization methods used to 
determine the LSSVR parameters. In this paper, particle swarm optimization (PSO) is used. PSO 
realizes optimization by searching the partial and global optimal solution in a particle swarm. Define 
the fitness function of the zth particle as: 

1 



J i (C^) = 



Ya(yk-f( x k)f+ e P s 



k=\ 



(32) 



where eps is a small value to avoid the denominator to be zero. 

T T 

The particle positions and velocities are written as X t =(xu, Xi2,... 9 Xio) and Vj=(vu, Vi2,... 9 v*d) 
(/=1,2, . . ., m). m is the number of the particles. The partial best position is written as P t and the global 
best position is written as P g . Then the velocity and position updating can be written as: 
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v id (k + l) = Iv id (k) + c xh (k)(P id (k) - x id (k)) + c 2 r 2 (k)(P gd (k) - x id (*)) 
x id (k + l) = x id (k) + v id (k + l) 



(33) 
(34) 



where v ic i(k) and x ic i(k) are the d\h dimension velocity and position of the zth partial in the Ml iteration, 
/ is an inertia factor, c\ and C2 are study factor, and P g d are the dth dimension partial and global 
best position in the kth iteration, n and r2 are random during [0,1]. The optimization process is 
described as Figure 4, where the initial positions and velocities are given randomly among the range of 
the particle swarm. 

Figure 4. The parameters optimization process of the LSSVR based the PSO algorithm. 
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In Figure 4, the partial and global best position searching process is as follows: 



if J i> J best> then Pi=Xi> 
if J^G^t, then P g =X i9 



^ best - Ji 



(35) 



where Jbest is the partial best fitness and Gb es t is the global best fitness. 

The end condition is meeting the accuracy requirements or arriving at the maximum iterations. When 
meeting the end condition, the best position is considered as the optimal parameters used in LSSVR. 

As time goes by, new observations are added which may include fault information and can improve 
prediction accuracy. However, the training data will become large with time growth. If the number of 
sample data is too large, it will affect training efficiency. To solve this problem, some unimportant old 
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data, such as non-support vector samples or far history samples, are selected for deletion in this paper. 
These data will not affect the training accuracy seriously. By adding new data and deleting some old 
data, the training samples sequence will be renewed online with time growth. The new training sample 
can be written as: 



X(t + l) = X(t) + X new (t + l)-X del (t) 



(36) 



where X(t) is the training sample at time t, X new (t+l) is the added new sample at time t + \ 9 X^\{t) is the 
deleted sample chosen fromX(£). 

The RUL prediction process based on the improved LSSVR algorithm is summarized in Figure 5. 

Figure 5. RUL prediction process based on the improved LSSVR with data renewed online. 
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As shown in Figure 5, at each time t, RUL equals to the future prediction time whose corresponding 
state is the last health state. The prediction accuracy of RUL increases as time increases because new 
observations are added and some unimportant old data are deleted, which makes the feature trend 
predictions more accurate. 

5. Simulations and Results 

To evaluate the performance of the LSSVR/HMM prognostic approach, it was tested using 
experimental bearing vibration data from the Case Western Reserve University Bearing Data Center [29]. 
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The bearing test stand consists of a 2 hp motor, a torque transducer/encoder, a dynamometer, and 
control electronics. The test bearings support the motor shaft. The details of the test stand are shown in 
the website [29]. The actual test conditions of the motor as well as the bearing fault status have been 
carefully documented for each experiment. Different levels of faults in diameter were collected 
separately for the inner raceway, rolling element (i.e., ball) and outer raceway. Faulted bearings were 
reinstalled into the test motor and vibration data was recorded. 

In our simulation, the inner race vibrations signals are used. Its fault diameter includes the 
following levels: 0.007 inches, 0.014 inches, and 0.021 inches. The data was sampled at 12 kHz. 
Added to the normal states, fault process can be divided into four health states. The bearing data center 
provides vibration data from different health states. Data from different conditions are discrete, which 
can be used to train HMMs for the corresponding health status. 

5.1. Feature Extraction 

Vibration data are segmented into several windows and extracted feature vectors as mentioned in 
Section 2.2 and Figure 3. In the simulation, 256 sample data were chosen as a frame. Based on the 
linear predictive method, signals in each window can be predicted as a linear combination of its 
p previous signal, which is shown in Equation (3). In the simulation, we take P as 8. Then the feature 
vector of each signal window is o t = [l t \, hi, /^] T and the number of the feature dimensions is 8. 
Take the first dimension as an example; the features of the four health states are shown in Figure 6. As 
shown in Figure 6, the extracted feature for each health state is different. This demonstrates that these 
features can be used to distinguish different fault states. 

Figure 6. The first dimension of the feature vectors for each health states. 
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5.2. HMMs Training and Fault Diagnosis 

Suppose three Gaussian distributions are used to describe observation probabilities for each state. 
Give the initial HMM parameters as follows: 

;r = [l,0,0,0] T 



A = 
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Generally speaking, the influence of the initial values of n and A is not large. But the initial values 
of [jiju Uju and Cji may affect the training result. They can be calculated using K-means method based on 
the given samples, which are not discussed here in detail. The convergence error is given as 0.00001. 

To improve the training accuracy, several observation sequences are used to train HMM for a health 
state. The log-likelihood probabilities of HMMs training are shown in Figure 7. 

Figure 7. Log-likelihoods of HMMs during training for the four states. 
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The simulation results shown in Figure 7 illustrate that the log-likelihoods tend to stabilize with the 
growth of the number of epochs. All of the training epochs are less than 40. These trained HMMs can 
be used to diagnose faults. In this paper, two vibration data sequences are specimens for fault diagnosis 
testing, where one is fault-free data and another is data with a fault diameter of 0.007 inches. The 
log-likelihood probabilities of the test vibration data with fault-free are shown in Figure 8, while the 
log-likelihood probabilities of the data with fault diameter of 0.007 inches are shown in Figure 9. 
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Figure 8. Log-likelihoods for the four state HMMs of the fault- free test data. 
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As shown in Figure 8, the HMM with the highest probability is health state 1, which represents 
fault-free state. That is to say the test data are fault-free. The result is consistent with the given data. 
Figure 9 shows that the HMM with the highest probability is health state 2, which represents the fault 
diameter of 0.007 inches. Therefore, the diagnosis result is consistent with the given data. According to 
the above two tests, fault diagnosis based on HMMs is efficient. 

Figure 9. Log-likelihoods for the four state HMMs of the test data. 
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5.3. Features and RUL Prediction 

Collecting lifetime data is a difficult work and the bearing data center does not provide this lifetime 
data. Therefore, we construct a lif time data sequence by taking some vibration data separately from 
each health states. Then the constructed lifetime data sequence is used to test for features prediction 
and fault prognostics. After frame segmentation and feature extraction, the length of the feature sequence 
is 71. We take the front 50 feature vectors as LSSVR training data, and the back 21 as unknown test data. 
With these parameters, the present time is t = 50 and the future prediction time is k = 21. We take the 
first dimension of the feature vectors as an example, and the prediction result is shown in Figure 10. 

Figure 10. The prediction result of the first dimension of the feature vectors based on 
the LSSVR. 
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In Figure 10, the data before t = 50 are regression estimation results and the data after t = 50 are 
prediction results according to Equation (28). The simulation results indicate that the prediction feature 
trend is essentially consistent with the actual trend. The simulation results can therefore be used to 
make a fault prognosis. 

Based on the trained four HMMs, the log-likelihood probabilities of the life time data for each state 
are shown in Figure 1 1 . 

The state with the highest log-likelihood is the health state of the corresponding time. As shown in 
Figure 11, the system goes from health state 1 to health state 4 during the lifetime. The system is in a 
normal state before t = 38. During the time from t = 39 to t = 55, the system is in health state 2 which 
means the fault diameter is at 0.007 inches. During the time from t = 56 to t = 65, the system is in 
health state 3, which means the fault diameter is at 0.014 inches. After t = 66, the system is in health 
state 4. That indicates that the system fails after t = 66. RUL values can be predicted at each time. The 
RUL prediction results at different times are shown in Table 1 . 
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Table 1. Remaining useful life prediction results. 
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In Table 1, the actual value of RUL is the time between the present time and the failure time t = 66. 
The predicted value of RUL is calculated based on the proposed LSSVR/HMM method. As shown in 
Table 1, the predicted RUL is less than the actual RUL. Therefore, the LSSVR/HMM hybrid prognosis 
method can forecast faults long before the actual fault occurs. The RUL prediction errors reduce with 
increasing time and it tends to be 1 time unit in the simulation, but in real situations, the prediction 
results may be not so good because the test lifetime data used in the simulation is perfect and shorter 
than an actual lifetime data sequence. 

6. Conclusions 

Fault prognosis is an essential and difficult technology in health management and CBM. In this 
paper, a hybrid prognostics approach which integrates the HMM method and LSSVR is presented. In 
the proposed LSSVR/HMM approach, the fault process is divided into several health states modeled 
by HMMs. These HMMs are trained based on history sample data. The lot-likelihood probabilities of 
the HMMs are used to classify data and determine the health state. The state with the highest 
log-likelihood is the corresponding health state of given data. LSSVR is used to predict the feature 
trend. Based on the predicted features and trained HMMs, the future health state can be predicted. 
LSSVR trains and predicts RULs online until the highest probability of the predicted feature vector is 
at the last health state. Then the RUL is equal to the prediction time. 
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Testing of the proposed LSSVR/HMM prognostics approach was carried out using bearing 
vibration data. The simulation results showed that the novel approach is efficient in fault diagnosis and 
prognostics and can also predict the RUL. Furthermore, the results showed that the RUL prediction 
accuracy increases with the passing of time. Future work will focus on how to improve the prediction 
accuracy and expand the applications. 
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